28 research outputs found

Accelerating Object-Sensitive Pointer Analysis by Exploiting Object Containment and Reachability (Artifact)

Author: Gao Yaoqing
He Dongjie
Lu Jingbo
Xue Jingling
Publication venue: DARTS - Dagstuhl Artifacts Series. DARTS, Volume 7, Issue 2, Special Issue of the 35th European Conference on Object-Oriented Programming (ECOOP 2021)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

MLGOPerf: An ML Guided Inliner to Optimize Performance

Author: Ashouri Amir H.
Chan Bryan
Elhoushi Mostafa
Gao Yaoqing
Hua Yuzhe
Manzoor Muhammad Asif
Wang Xiang
Publication date: 19/07/2022

For the past 25 years, we have witnessed an extensive application of Machine Learning to the compiler space, in particular to the selection and phase-ordering problems. However, few of these works have been upstreamed into state-of-the-art compilers such as LLVM, where they could be integrated seamlessly into the optimization pipeline and readily deployed by the user. MLGO was among the first such projects, and it strives only to reduce the code size of a binary with an ML-based Inliner using Reinforcement Learning. This paper presents MLGOPerf, the first end-to-end framework capable of optimizing performance using LLVM's ML-Inliner. It employs a secondary ML model to generate rewards used for training a retargeted Reinforcement Learning agent, previously used as the primary model by MLGO. It does so by predicting the post-inlining speedup of a function under analysis, which enables a fast training framework for the primary model that otherwise wouldn't be practical. The experimental results show that MLGOPerf gains up to 1.8% and 2.2% over LLVM's optimization at O3 when trained for performance on the SPEC CPU2006 and Cbench benchmarks, respectively. Furthermore, the proposed approach provides up to 26% more opportunities to autotune code regions for our benchmarks, which can be translated into an additional 3.7% speedup. Comment: Version 2: Added the missing Table 6. The short version of this work is accepted at ACM/IEEE CASES 202

arXiv.org e-Print Archive

Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

Author: A. Peymandoust
Alastair R. Beresford
Andreas Gal Albert Noll
Bram Adams
Bratin Saha
Carl Hewitt
Charles Antony Richard Hoare
Charles R. Johns
Chen-Yong Cher
Colin Blundell
David Ungar
David Wentzlaff
Doug Lea
ECMA International
Edward A. Lee
Freescale Semiconductor
Georg Sorst
Gul Agha
Hans Schippers
Haris Volos
Intel Corporation
James Gosling
Jim Gray
John A. Trono
John S. Danaher
John Zigman
José M. Piquer
Kevin Casey
Kevin Williams
Larry Seiler
Lukasz Ziarek
M. Anton Ertl
Mark S. Miller
Maurice Herlihy
Michael Haupt
Michael R. Marty
Nir Shavit
Pascal Costanza
Philipp Haller
Rajesh K. Karmani
Robert D. Blumofe
Robert Virding
Simon Gay
Sriram Srinivasan
Stefan Marr
Stijn Timbermont
Theo D'Hondt
Thomas Kistler
Tom Van Cutsem
Uwe Kastens
Vijay A. Saraswat
Virendra J. Marathe
Wenzhang Zhu
Wolfgang De Meuter
Xu Wang
Yaoqing Gao
Publication venue: 'Open Publishing Association'
Publication date: 01/02/2010

The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology for designing instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to guide the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared-memory concurrency and one for non-shared-memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Kent Academic Repository

Accelerating Object-Sensitive Pointer Analysis by Exploiting Object Containment and Reachability

Author: Gao Yaoqing
He Dongjie
Lu Jingbo
Xue Jingling
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th European Conference on Object-Oriented Programming (ECOOP 2021)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

A survey of implementations of concurrent, parallel and distributed Smalltalk

Author: Agha Actors
Akinori Yonezawa ABCL
Chung Kwong Yuen
Eddy Bledoeg The
Gao Yaoqing C.K.
Gao Yaoqing Shen Meiming
Lee J.H.
Li Jing Wei Gao Yaoqing
Yaoqing Gao
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Speculative Parallelism Improves Search?

Author: T. A. Marsland
Yaoqing Gao

The extreme efficiency of sequential search, and the natural tendency of tree pruning systems to produce wide variations in workload, partly explain why it is proving difficult to achieve more than 30-50% efficiency for massively parallel implementations of the αβ algorithm. Here we introduce typical enhanced sequential algorithms and address the major issues of parallel game-tree searching under conditions of severe pruning. It is this pruning that makes the parallelization difficult. After examining previous work on parallel αβ algorithms, we present a new method called Dynamic Multiple Principal Variation Splitting (DM-PVSplit) and implement it on the AP1000. In this algorithm, high performance is achieved by using some novel approaches. Parallel speculative search of candidate principal variations is used to reduce re-search delay and so obtain a better estimate of the subtree value more quickly. This is achieved by configuring a flat processor arrangement as a dynamically changeable tree structure. Also, with the aid of a group-based scheduling strategy, the game tree is split dynamically at different levels. This provides better load balance and takes more advantage of parallelism. Preliminary experiments show that the DM-PVSplit algorithm scales well on massively parallel machines.

CiteSeerX

Multithreaded Pruned Tree Search In Distributed Systems

Author: T. A. Marsland
Yaoqing Gao

Although efficient support for data-parallel applications is relatively well established, it remains an open question how best to support irregular and dynamic problems that have no regular data structures or communication patterns. Tree search is central to solving a variety of problems in artificial intelligence and is an important subset of the irregular applications in which tasks are frequently created and terminated. In this paper, we introduce the design of a multithreaded distributed runtime system. Efficiency and ease of parallel programming are the two primary goals. In our system, multithreading is used to specify the asynchronous behavior in parallel game tree search, and dynamic load balancing is employed for efficient performance.

CiteSeerX

IoP System Dependability Evaluation Method Based on AADL

Author: Gao Yaoqing
Mian Zhibao
Shi Xiaodong
Publication venue: Shanghai Jisuanji Xuehui/Shanghai Computer Society
Publication date: 03/01/2022

The Internet of People (IoP) is characterized by a complex architecture and massive, changing data, which makes IoP-based system dependability difficult to analyze. Currently, there is still no robust dependability modelling and analysis method for IoP systems. This paper proposes an Architecture Analysis and Design Language (AADL)-based dependability evaluation method for IoP systems. By using AADL and its annex language, the dependability of IoP systems is modeled to support qualitative analysis of the causes of system failures and risks. Furthermore, by combining the Ocarina model transformation technology, a quantitative evaluation algorithm based on the Continuous-Time Markov Chain (CTMC) is proposed. The algorithm transforms the AADL dependability model into a CTMC model, so that the dynamic and real-time attributes of IoP systems can be evaluated quantitatively. On this basis, a general IoP system model is designed to demonstrate the feasibility of the proposed method. The experimental results show that the proposed method can be used to model IoP systems and to perform dependability analysis automatically and accurately, displaying a high application value.

Repository@Hull - Worktribe

Delta Send-Recv for Dynamic Pipelining in MPI Programs

Author: Bin Bao
Chen Ding
Roch Archambault
Yaoqing Gao

Pipelining is necessary for efficient do-across parallelism, but it is difficult to automate because it requires send-receive analysis and loop blocking in both sender and receiver code. The blocking factor is statically chosen. This paper presents a new interface called delta send-recv. Through compiler and run-time support, it enables dynamic pipelining. In program code, the interface is used to mark the related computation and communication. There is no need to restructure the computation code or compose multiple messages. At run time, the message size is dynamically determined, and multiple pipelines are chained among all tasks that participate in the delta communication. The new system is tested on kernel and reduced NAS benchmarks to show that it simplifies message-passing programming and improves program performance. Keywords: MPI; communication-computation overlapping; dynamic pipelining

CiteSeerX

Crossref